Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods

Authors

  • Petros Drineas
  • Michael W. Mahoney
  • S. Muthukrishnan
Abstract

Much recent work in theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m×n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a small number of columns of A, R consists of a small number of rows of A, and U is a small, carefully constructed matrix that guarantees that the product CUR is “close” to A. Applications of such decompositions include the computation of matrix “sketches”, speeding up kernel-based statistical learning, preserving sparsity in low-rank matrix representations, and improved interpretability of data analysis methods. Our main result is a randomized, polynomial-time algorithm which, given as input an m×n matrix A, returns as output matrices C, U, R such that ‖A − CUR‖_F ≤ (1 + ε) ‖A − A_k‖_F with probability at least 1 − δ. Here, A_k is the “best” rank-k approximation (provided by truncating the Singular Value Decomposition of A), and ‖X‖_F is the Frobenius norm of the matrix X. The number of columns in C and rows in R is a low-degree polynomial in k, 1/ε, and log(1/δ). Our main result is obtained by extending our recent relative-error approximation algorithm for ℓ2 regression from overconstrained problems to general ℓ2 regression problems. Our algorithm is simple, and its running time is of the order of the time needed to compute the top k right singular vectors of A. In addition, it samples the columns and rows of A via the method of “subspace sampling,” so named because the sampling probabilities depend on the lengths of the rows of the top singular vectors, and because they ensure that we capture entirely a certain subspace of interest.
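The sampling scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is a simplification under stated assumptions, not the authors' algorithm: the function names `leverage_probs` and `cur_sketch` are hypothetical, a full SVD is computed for brevity, rows are sampled using the top-k left singular vectors of A, sampled columns and rows are not rescaled, and U is taken to be the pseudoinverse-based choice C⁺AR⁺ (which minimizes ‖A − CUR‖_F for fixed C and R) in place of the construction used in the paper.

```python
import numpy as np

def leverage_probs(basis):
    """Sampling probabilities proportional to the squared row lengths
    of a matrix with k orthonormal columns (the top-k singular vectors)."""
    p = np.sum(basis**2, axis=1)
    return p / p.sum()  # the squared row lengths sum to k

def cur_sketch(A, k, c, r, rng):
    """Illustrative CUR: sample c columns and r rows of A with
    subspace-sampling probabilities, then fit the middle matrix U."""
    U_svd, _, Vt = np.linalg.svd(A, full_matrices=False)

    # Column probabilities come from the top-k right singular vectors.
    p_cols = leverage_probs(Vt[:k].T)
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p_cols)

    # Assumption: row probabilities come from the top-k left singular
    # vectors of A itself.
    p_rows = leverage_probs(U_svd[:, :k])
    rows = rng.choice(A.shape[0], size=r, replace=True, p=p_rows)

    C = A[:, cols]
    R = A[rows, :]

    # Assumption: U = pinv(C) @ A @ pinv(R), the Frobenius-optimal middle
    # matrix for this C and R. The explicit rcond guards against the
    # rank deficiency caused by repeated columns or rows.
    U = np.linalg.pinv(C, rcond=1e-10) @ A @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
    C, U, R = cur_sketch(A, k=3, c=8, r=8, rng=rng)
    print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```

When A has rank exactly k and the sampled columns and rows span its column and row spaces, this construction reproduces A up to rounding error. For general A, the (1 + ε) guarantee stated in the abstract applies to the authors' algorithm and sample sizes, not to this sketch.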


Related resources

Column Subset Selection with Missing Data via Active Sampling

Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work...


Relative-Error CUR Matrix Decompositions (arXiv:0708.3696v1 [cs.DS], 27 Aug 2007)

Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly express...


Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods

Given an m×n matrix A and an integer k less than the rank of A, the “best” rank-k approximation to A that minimizes the error with respect to the Frobenius norm is A_k, which is obtained by projecting A onto the top k left singular vectors of A. While A_k is routinely used in data analysis, it is difficult to interpret and understand it in terms of the original data, namely the columns and rows of ...
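The projection described above can be written in a few lines of NumPy. This is a minimal sketch; the function name `best_rank_k` is hypothetical and a dense SVD is assumed to be affordable.

```python
import numpy as np

def best_rank_k(A, k):
    """Return the best rank-k approximation of A in the Frobenius norm,
    together with all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Project A onto the span of its top-k left singular vectors.
    A_k = U_k @ (U_k.T @ A)
    return A_k, s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 8))
    A_k, s = best_rank_k(A, 2)
    # The residual equals the root of the sum of the discarded squared
    # singular values.
    print(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[2:] ** 2)))
```

The same matrix can be formed as the truncated SVD, that is, the product of the top k left singular vectors, the top k singular values, and the top k right singular vectors.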


Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström a...
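For orientation, the standard Nyström approximation mentioned here can be sketched as follows. The function name `nystrom` is hypothetical, and the columns are chosen uniformly at random purely as an assumption for illustration; this is not the adaptive sampling scheme the paper proposes.

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation of a symmetric PSD matrix K from the
    columns indexed by idx: K is approximated by C @ pinv(W) @ C.T."""
    C = K[:, idx]                # the selected columns of K
    W = K[np.ix_(idx, idx)]      # the intersection of those columns and rows
    # Explicit rcond so that near-zero singular values of W are discarded.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    B = rng.standard_normal((30, 4))
    K = B @ B.T                  # symmetric PSD with rank 4
    idx = rng.choice(30, size=8, replace=False)  # assumption: uniform sampling
    print(np.linalg.norm(K - nystrom(K, idx)))
```

The structure parallels CUR: C plays the role of the sampled columns, its transpose plays the role of R, and the pseudoinverse of the intersection block W plays the role of U.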


Improving CUR Matrix Decomposition and Nyström Approximation via Adaptive Sampling

The CUR matrix decomposition and Nyström method are two important low-rank matrix approximation techniques. The Nyström method approximates a positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, the CUR decomposition can be regarded as an extension of the Nyström method. In this p...



Publication date: 2006